Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADAP-850: Support test results as a view #8653

Merged
merged 40 commits into from
Oct 10, 2023

Conversation

mikealfare
Copy link
Contributor

@mikealfare mikealfare commented Sep 15, 2023

resolves #6914
docs: dbt-labs/docs.getdbt.com#4246

Problem

We now support materialized views (and dynamic tables in Snowflake). These new objects can update without running dbt. However, the test results will stay static if we run them with --store-failures, or just won't exist at all if we don't specify --store-failures. This is philosophically inconsistent behavior.

Solution

Support test results as a view, so that they will always show the current failures without the need to run dbt. We should also support table as a relation type since this is effectively --store-failures. Users can specify this setting in the config block in their tests by setting store_failures_as="view".

Approach:

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@mikealfare mikealfare self-assigned this Sep 15, 2023
@cla-bot cla-bot bot added the cla:yes label Sep 15, 2023
@codecov
Copy link

codecov bot commented Sep 16, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (46ee3f3) 86.56% compared to head (8b8f9bb) 86.51%.
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8653      +/-   ##
==========================================
- Coverage   86.56%   86.51%   -0.05%     
==========================================
  Files         176      176              
  Lines       25820    25869      +49     
==========================================
+ Hits        22350    22381      +31     
- Misses       3470     3488      +18     
Flag Coverage Δ
integration 83.26% <100.00%> (-0.14%) ⬇️
unit 65.07% <78.57%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
core/dbt/contracts/graph/model_config.py 92.34% <100.00%> (+0.16%) ⬆️
core/dbt/contracts/relation.py 82.35% <100.00%> (+0.21%) ⬆️
core/dbt/parser/generic_test_builders.py 94.00% <100.00%> (+0.14%) ⬆️

... and 9 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mikealfare mikealfare marked this pull request as ready for review September 16, 2023 04:55
@mikealfare mikealfare requested review from a team as code owners September 16, 2023 04:55
…, update test to check for relation type and confirm views update
@mikealfare mikealfare added the user docs [docs.getdbt.com] Needs better documentation label Sep 16, 2023
@dbeatty10
Copy link
Contributor

@dbeatty10 @dataders @Fleid @colin-rogers-dbt Please review the test cases in particular to make sure they align with the intention of the feature.

@mikealfare Did you add any test cases that cover the generic tests described in the reprex of "3. Generic tests" here?

@mikealfare
Copy link
Contributor Author

@dbeatty10 @dataders @Fleid @colin-rogers-dbt Please review the test cases in particular to make sure they align with the intention of the feature.

@mikealfare Did you add any test cases that cover the generic tests described in the reprex of "3. Generic tests" here?

I'm working on that today. I wanted to swap the existing state first and get that pushed up.

@dbeatty10
Copy link
Contributor

@colin-rogers-dbt identified an unspecified piece here:

  • Users will be unable to turn off storing failures in the model config if they specify store_failures_as at a higher level

UX to address this: #6914 (comment)

@mikealfare
Copy link
Contributor Author

mikealfare commented Oct 3, 2023

Generic test test cases are in and passing. I merged your updates @dbeatty10 and they addressed this gap.

Users will be unable to turn off storing failures in the model config if they specify store_failures_as at a higher level

This is possible, but not intuitive. I added a test case that demonstrates how to do this. Basically you set store_failures = False and store_failures_as = None:

TEST__NONE_FALSE = """
{{ config(store_failures_as=None, store_failures=False) }}
select *
from {{ ref('chipmunks') }}
where shirt = 'green'
"""

It's run in this line:

"results_turn_off.sql": TEST__NONE_FALSE,

Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, doesn't look like we have covered these with unit tests in the past but worth considering if this change justifies some: https://github.com/dbt-labs/dbt-core/blob/main/tests/unit/test_model_config.py


The intention of this block is to behave as if `store_failures_as` is the only setting,
but still allow for backwards compatibility for `store_failures`.
See https://github.com/dbt-labs/dbt-core/issues/6914 for more information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@dbeatty10
Copy link
Contributor

This is possible, but not intuitive.

@mikealfare Thanks for explaining how this is possible right now, even it if it isn't intuitive.

We would love to make this intuitive so that it's easier to document and easier to accomplish.

Do you have any concerns with the proposal in #6914 (comment)? It proposes using "ephemeral" to mean "don't actually create any database object".

@colin-rogers-dbt
Copy link
Contributor

@dbeatty10 in terms of the config parameter naming are you comfortable with proceeding with store_failures_as or do you feel we should stick with strategy?

I prefer store_failures_as since it's clearer/more specific but I think it's more important to get this out the door so don't want this to be a blocker

@mikealfare
Copy link
Contributor Author

mikealfare commented Oct 9, 2023

@dbeatty10, I'm comfortable moving forward with "ephemeral"; I'll make those changes and push it up.

I agree with @colin-rogers-dbt that we should stick with store_failures_as. My argument for store_failures_as is that it directly impacts store_failures (in the sense that it takes precedence). This behavior feels more intuitive if the names of the settings are similar. Thoughts @dbeatty10?

And to confirm, is this all that's left for this PR? Or am I missing another issue?

@dbeatty10
Copy link
Contributor

@dbeatty10 in terms of the config parameter naming are you comfortable with proceeding with store_failures_as or do you feel we should stick with strategy?

I prefer store_failures_as since it's clearer/more specific but I think it's more important to get this out the door so don't want this to be a blocker

@colin-rogers-dbt and @mikealfare

My order of preference

  1. Re-use an existing config name that has same meaning (i.e., materialized)
  2. Create a new config name that could be re-used across resource types (like strategy, etc.)
  3. Create a new config name that can only be used for a single resource type (like store_failures_as, etc.)

This order is meant to capture what would be most easy for users.

I totally understand that there may be engineering considerations that could affect one or more of these. If so, would love to learn anything you've uncovered or anything you think should be taken into consideration.

To be clear, I think any of them are workable long-term from a user perspective. They are just ordered from most intuitive to least. Theoretically, an experienced dbt user wouldn't even need to consult the documentation for the first one, but would definitely need to read docs for the last one.

@dbeatty10
Copy link
Contributor

And to confirm, is this all that's left for this PR? Or am I missing another issue?

@mikealfare

Outstanding things to solve

From my perspective, here's the user experience items that we should finalize:

  1. Naming of the config (store_failures_as, strategy, materialized, etc.)
  2. Accepted values of the config (table, view, ephemeral, etc.)
  3. How to behave if the user supplies a non-accepted value (materialized_view, incremental, purple_people, etc)

See above for my take on the first two, and see below for the last one.

How to behave if the user supplies a non-accepted value

It seems like we should raise an error for invalid values. It could be something similar to this:

20:23:35  Completed with 1 error and 0 warnings:
20:23:35  
20:23:35    Compilation Error in test some_test_name_here (tests/some_file_name_here.sql)
  'purple_people' is not a valid value for configuration_name_here. Accepted values are ['table', 'view', and 'ephemeral']

What do you think about raising an error when the config includes an invalid value?

@colin-rogers-dbt
Copy link
Contributor

As @mikealfare described above materialized is problematic in our current implementation so I don't think we should use it here.

Theoretically, an experienced dbt user wouldn't even need to consult the documentation for the first one, but would definitely need to read docs for the last one.

I like this principle a lot but are we concerned about reusing words in different contexts that accept different inputs and mean different things? In this case strategy means one thing when configuring snapshots another in incremental and now would be another for tests.

Also, I don't think we do but any chance we're going to want to store success in the future? (i.e. would we need separate 'strategies' for successful /failing tests?)

How to behave if the user supplies a non-accepted value

+1 a compilation error is what a user should expect here

@mikealfare
Copy link
Contributor Author

  1. Re-use an existing config name that has same meaning (i.e., materialized)

If we could go back and change how tests work, then I agree that materialized makes the most sense here. However that's already taken up with the value (and associated functionality) of test. I could see reworking that over time, but not before the 1.7 rc. The blast radius would be huge and I'd want to get a lot of input before a change like that.

  1. Create a new config name that could be re-used across resource types (like strategy, etc.)

Given this is the next preference, I'll start making these updates. But I do have some questions:

  1. What is meant by a "resource type"?
  2. What is the valid value list for strategy? Would that list change for other resource types?

Regarding the second question, I could see the list getting shorter or longer across resource type (e.g. table/view for one and table/view/ephemeral for another). But I think it would get really hairy if we start accepting different concepts across resource type (e.g. table/view for one and merge/delete+insert for another).

@colin-rogers-dbt got this in while I was writing this up. So to include his piece here, he is highlighting exactly my concern with strategy and provides good context for my second question:

I like this principle a lot but are we concerned about reusing words in different contexts that accept different inputs and mean different things? In this case strategy means one thing when configuring snapshots another in incremental and now would be another for tests.


  1. Naming of the config (store_failures_as, strategy, materialized, etc.)

From the above, it seems like this is strategy.

  1. Accepted values of the config (table, view, ephemeral, etc.)

The valid values are currently any value of RelationType. However, this is implicit and not strictly enforced. We only support table and view for now (and ephemeral after implementing what we mentioned above). Theoretically you could also pass in materialized_view and see what happens, YMMV. That being said, if we're going to raise an exception for invalid values as stated in 3., then it would be the three values you stated.

  1. How to behave if the user supplies a non-accepted value (materialized_view, incremental, purple_people, etc)

I'm sure we don't fail gracefully here as the valid value list is implicit. I'll see if I can throw something in along the lines of your suggestion, but I can't promise that it will cover all cases.

@mikealfare
Copy link
Contributor Author

For those following along at home, ephemeral is now in, as well as the compilation error.

@dbeatty10
Copy link
Contributor

I like this principle a lot but are we concerned about reusing words in different contexts that accept different inputs and mean different things? In this case strategy means one thing when configuring snapshots another in incremental and now would be another for tests.

Definitely hear you both that using strategy would be confusing because it is already in use by incremental materializations.

I'm not attached to the name strategy at all, was just using it as a stand-in for the idea of "Create a new config name that could be re-used across resource types".

By "resource types" (AKA "node types"), I mean these which are listed in the code here.

store_failures_as

In addition to @mikealfare & @colin-rogers-dbt being advocates of store_failures_as, @jtcohen6 stated preference for store_failures_as in dbt internal Slack here. @Fleid also expressed preference for this name in a Zoom call this afternoon.

So while materialized seems like the most simple, elegant, and re-usable config name to me, I will disagree and commit on this one and also support store_failures_as as the name of a new config.

Sorry if this causes any churn for you @mikealfare!

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikealfare

Works as expected

I re-tried the use cases described here in a local environment and they worked 😎

Each of these values all behaved as expected:

store_failures_as: ephemeral
store_failures_as: null
store_failures_as:

I also tried giving invalid values like the following, which also gave appropriate error messages:

store_failures_as: materialized_view
store_failures_as: carrot
store_failures_as: none
store_failures_as: None

One edge case

One discovered edge case was switching from table (or view) to ephemeral between successive dbt runs.

When switching from table to view, the 1st relation in dropped and the 2nd is created. But when switching from table to ephemeral, the table persists afterwards and needs to be manually dropped.

Summary

Regardless of that edge case, approving this PR 👍

We can follow-up in a separate issue to decide what to do when switching from table (or view) to ephemeral.

@mikealfare mikealfare merged commit 3d27483 into main Oct 10, 2023
@mikealfare mikealfare deleted the feature/materialized-tests/adap-850 branch October 10, 2023 21:26
@FishtownBuildBot
Copy link
Collaborator

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#4246

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-2070] [Feature] Materialized tests
6 participants